A Web-Platform for Preserving, Exploring, Visualising, and Querying Linguistic Corpora and other Resources
Abstract
We present SPLICR, the Web-based Sustainability Platform for Linguistic Corpora and Resources. The system is aimed at people who work in Linguistics or Computational Linguistics: they can explore a comprehensive database of metadata records to find language resources that suit their specific research needs. SPLICR also provides a graphical interface that enables users to query and visualise corpora. The project in which the system is being developed aims to archive sustainably the approximately 60 language resources that have been constructed in three collaborative research centres. Our project has two primary goals: (a) to process and archive the resources sustainably so that they remain available to the research community in five, ten, or even 20 years' time; and (b) to enable researchers to query the resources both on the level of their metadata and on the level of their linguistic annotations. More generally, our goal is to enable solutions that enhance the interoperability, reusability, and sustainability of heterogeneous collections of language resources.
Similar resources
Ontology-Based XQuery'ing of XML-Encoded Language Resources on Multiple Annotation Layers
We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology provides a homogenising view on the conceptually different markup languages so that a common querying framework can be established using the method of ontology-based query expansion. In addition, we present a highly...
Tools and Resources for Visualising Conversational-Speech Interaction
This paper describes tools and techniques for accessing large quantities of speech data and for the visualisation of discourse interactions and events at levels above that of linguistic content. We are working with large quantities of dialogue speech including business meetings, friendly discourse, and telephone conversations, and have produced web-based tools for the visualisation of non-verba...
Searchable Metaspaces 1 Overview of Objectives
The purpose of this presentation is to start a discussion about methodological and operational requirements for developing tools for internet browsing and/or querying of meta-descriptions of language resources, in particular multimodal corpora. Among the most important requirements are: delimiting the relationship both between meta-descriptions and the resources they apply to, and between brows...
A new Ontology Lookup Service at EMBL-EBI
The use of bio-medical ontologies for the annotation, integration and analysis of biological data is now well established in bioinformatics. The range and diversity of ontologies has increased dramatically over the last ten years and community efforts such as the OBO foundry have been instrumental in coordinating this activity. The demand for unified mechanisms for accessing large collections...
ZT Corpus: Annotation and Tools for Basque Corpora
The ZT Corpus (Basque Corpus of Science and Technology) is a tagged collection of specialised texts in Basque, which aims to be a major resource in research and development with respect to written technical Basque: terminology, syntax and style. It was released in December 2006 and can be queried at http://www.ztcorpusa.net. The ZT Corpus stands out among other Basque corpora for many reasons: ...
Journal title:
- Procesamiento del Lenguaje Natural
Volume 41, Issue
Pages: -
Publication date: 2008